The impact of the Lombard effect on audio and visual speech recognition systems
When producing speech in noisy backgrounds talkers reflexively adapt their speaking style in ways that increase speech-in-noise intelligibility. This adaptation, known as the Lombard effect, is likely to have an adverse effect on the performance of automatic speech recognition systems that have not been designed to anticipate it. However, previous studies of this impact have used very small amounts of data and recognition systems that lack modern adaptation strategies. This paper aims to rectify this by using a new audio-visual Lombard corpus containing speech from 54 different speakers – significantly larger than any previously available – and modern state-of-the-art speech recognition techniques.
The paper is organised as three speech-in-noise recognition studies. The first examines the case in which a system is presented with Lombard speech having been exclusively trained on normal speech. It was found that the Lombard mismatch caused a significant decrease in performance even if the level of the Lombard speech was normalised to match the level of normal speech. However, the size of the mismatch was highly speaker-dependent, thus explaining conflicting results presented in previous smaller studies. The second study compares systems trained in matched conditions (i.e., training and testing with the same speaking style). Here the Lombard speech affords a large increase in recognition performance. Part of this is due to the greater energy leading to a reduction in noise masking, but performance improvements persist even after the effect of the signal-to-noise level difference is compensated. An analysis across speakers shows that the Lombard speech energy is spectro-temporally distributed in a way that reduces energetic masking, and this reduction in masking is associated with an increase in recognition performance. The final study repeats the first two using a recognition system trained on visual speech. In the visual domain, performance differences are not confounded by differences in noise masking. It was found that in matched conditions Lombard speech supports better recognition performance than normal speech. The benefit was consistently present across all speakers but to a varying degree. Surprisingly, the Lombard benefit was observed to a small degree even when training on mismatched non-Lombard visual speech, i.e., the increased clarity of the Lombard speech outweighed the impact of the mismatch.
The paper presents two generally applicable conclusions: i) systems that are designed to operate in noise will benefit from being trained on well-matched Lombard speech data; ii) the results of speech recognition evaluations that employ artificial speech and noise mixing need to be treated with caution: they are overly optimistic to the extent that they ignore a significant source of mismatch, but at the same time overly pessimistic in that they do not anticipate the potential increased intelligibility of the Lombard speaking style.
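The level normalisation and the artificial speech-and-noise mixing mentioned in the abstract above can be sketched as follows. This is a minimal illustration only, not the paper's pipeline: the function names (`rms`, `scale_to_rms`, `mix_at_snr`, `measure_snr_db`) are hypothetical, signals are plain lists of samples, and the level is taken to be the broadband RMS.

```python
import math

def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def scale_to_rms(signal, target_rms):
    """Rescale a signal so that its RMS level equals target_rms."""
    gain = target_rms / rms(signal)
    return [gain * x for x in signal]

def measure_snr_db(speech, noise):
    """Signal-to-noise ratio in dB, computed from RMS levels."""
    return 20.0 * math.log10(rms(speech) / rms(noise))

def mix_at_snr(speech, noise, target_snr_db):
    """Add noise to speech after scaling the noise to the requested SNR.

    Assumption: the speech is left untouched and only the noise is rescaled,
    which is one common convention for artificial mixing.
    """
    noise_rms = rms(speech) / (10.0 ** (target_snr_db / 20.0))
    scaled_noise = scale_to_rms(noise, noise_rms)
    return [s + n for s, n in zip(speech, scaled_noise)]

# Toy signals standing in for recordings (purely illustrative).
normal = [0.2 * math.sin(0.05 * i) for i in range(2000)]
lombard = [0.5 * math.sin(0.07 * i) for i in range(2000)]
noise = [0.1 * math.sin(1.3 * i) for i in range(2000)]

# Level-normalise the "Lombard" signal to the level of the "normal" one,
# then mix both with the same noise at the same nominal SNR.
lombard_normalised = scale_to_rms(lombard, rms(normal))
noisy_normal = mix_at_snr(normal, noise, 5.0)
noisy_lombard = mix_at_snr(lombard_normalised, noise, 5.0)
```

Mixing of this kind fixes the SNR by construction, which is why it cannot capture how a talker would actually have changed speaking style in the presence of the noise.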
Exploring the use of group delay for generalised VTS based noise compensation
In earlier work we studied the effect of statistical normalisation
for phase-based features and observed it leads to a significant
robustness improvement. This paper explores the extension
of the generalised Vector Taylor Series (gVTS) noise
compensation approach to the group delay (GD) domain. We
discuss the problems it presents, propose some solutions and
derive the corresponding formulae. Furthermore, the effects
of additive and channel noise in the GD domain were studied.
It was observed that the GD of the noisy observation is a convex
combination of the GDs of the clean signal and the additive
noise and that, in the expected sense, the channel GD tends to
zero. Experiments on Aurora-4 showed that, despite training
only on clean speech, the proposed features provide average
WER reductions of 0.8% absolute and 4.1% relative compared
to an MFCC-based system trained on multi-style
data. Combining the gVTS with a bottleneck DNN-based system
led to average absolute (relative) WER improvements of
6.0% (23.5%) when training on clean data and 2.5% (13.8%)
when using multi-style training with additive noise.
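The relation between the absolute and relative WER figures quoted above can be checked with a few lines of arithmetic. This is a sketch under one assumption: that each relative reduction is the absolute reduction divided by the baseline WER. The baseline values it produces are implied by the rounded figures in the abstract and are not reported there; the function names are hypothetical.

```python
def implied_baseline_wer(absolute_reduction, relative_reduction_pct):
    """Baseline WER (in %) implied by an absolute and a relative reduction.

    Uses relative = absolute / baseline, so baseline = absolute / relative.
    """
    return absolute_reduction / (relative_reduction_pct / 100.0)

def relative_reduction_pct(baseline_wer, new_wer):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# (absolute, relative) pairs quoted in the abstract.
quoted = {
    "GD features vs multi-style MFCC": (0.8, 4.1),
    "gVTS + bottleneck DNN, clean training": (6.0, 23.5),
    "gVTS + bottleneck DNN, multi-style training": (2.5, 13.8),
}

for label, (absolute, relative) in quoted.items():
    baseline = implied_baseline_wer(absolute, relative)
    print(f"{label}: implied baseline WER of about {baseline:.1f}%")
```

Because the quoted percentages are rounded, the implied baselines are only approximate.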
Glassy dynamics in granular compaction: sand on random graphs
We discuss the use of a ferromagnetic spin model on a random graph to model
granular compaction. A multi-spin interaction is used to capture the
competition between local and global satisfaction of constraints characteristic
for geometric frustration. We define an athermal dynamics designed to model
repeated taps of a given strength. Amplitude cycling and the effect of
permanently constraining a subset of the spins at a given amplitude are
discussed. Finally we check the validity of Edwards' hypothesis for the
athermal tapping dynamics. Comment: 13 pages Revtex, minor changes, to appear in PR
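An athermal tapping dynamics of the kind described above can be sketched as follows. The abstract does not give the details, so everything specific here is an assumption made for illustration: the interaction is taken to be a three-spin product on randomly chosen triples, a tap flips each spin independently with probability `strength`, and the system is then quenched by flipping any spin that strictly lowers the energy until none can. All function names are hypothetical.

```python
import random

def make_plaquettes(n_spins, n_plaquettes, rng):
    """Pick random triples of distinct spins (a random three-spin hypergraph)."""
    return [tuple(rng.sample(range(n_spins), 3)) for _ in range(n_plaquettes)]

def build_index(n_spins, plaquettes):
    """For every spin, list the plaquettes it belongs to."""
    plaquettes_of = [[] for _ in range(n_spins)]
    for plaquette in plaquettes:
        for site in plaquette:
            plaquettes_of[site].append(plaquette)
    return plaquettes_of

def energy(spins, plaquettes):
    """Ferromagnetic three-spin energy: minus the sum of plaquette products."""
    return -sum(spins[i] * spins[j] * spins[k] for i, j, k in plaquettes)

def local_field(site, spins, plaquettes_of):
    """Sum, over plaquettes containing the site, of the product of the other spins."""
    total = 0
    for plaquette in plaquettes_of[site]:
        product = 1
        for other in plaquette:
            if other != site:
                product *= spins[other]
        total += product
    return total

def quench(spins, plaquettes_of):
    """Zero-temperature relaxation: flip spins while a flip strictly lowers the energy."""
    changed = True
    while changed:
        changed = False
        for site in range(len(spins)):
            # The site contributes -s * h to the energy; flip if that is positive.
            if spins[site] * local_field(site, spins, plaquettes_of) < 0:
                spins[site] = -spins[site]
                changed = True

def tap(spins, plaquettes_of, strength, rng):
    """One tap: random flips with probability `strength`, followed by a quench."""
    for site in range(len(spins)):
        if rng.random() < strength:
            spins[site] = -spins[site]
    quench(spins, plaquettes_of)

rng = random.Random(0)
n_spins = 40
plaquettes = make_plaquettes(n_spins, 40, rng)
plaquettes_of = build_index(n_spins, plaquettes)
spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
for _ in range(10):
    tap(spins, plaquettes_of, 0.2, rng)
```

The quench always terminates because every accepted flip lowers the energy by at least 2 and the energy is bounded below by minus the number of plaquettes.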
An Equation of State of a Carbon-Fibre Epoxy Composite under Shock Loading
An anisotropic equation of state (EOS) is proposed for the accurate
extrapolation of high-pressure shock Hugoniot (anisotropic and isotropic)
states to other thermodynamic (anisotropic and isotropic) states for a shocked
carbon-fibre epoxy composite (CFC) of any symmetry. The proposed EOS, using a
generalised decomposition of a stress tensor [Int. J. Plasticity 24,
140 (2008)], represents a mathematical and physical generalisation of the
Mie-Grüneisen EOS for isotropic material and reduces to this equation in
the limit of isotropy. Although a linear relation between the generalised
anisotropic bulk shock velocity and the particle velocity was
adequate in the through-thickness orientation, the damage softening process
produces discontinuities both in value and slope in the shock
velocity-particle velocity relation. Therefore, the two-wave structure
(non-linear anisotropic and isotropic elastic waves) that accompanies the
damage softening process was proposed
for describing CFC behaviour under shock loading. The linear shock
velocity-particle velocity relationship over the range of measurements
corresponding to the non-linear anisotropic elastic wave has an intercept
that lies between the first and second
generalised anisotropic bulk speeds of sound [Eur. Phys. J. B 64, 159
(2008)]. An analytical calculation showed that Hugoniot Stress Levels (HELs) in
different directions for a CFC composite subject to the two-wave structure
(non-linear anisotropic elastic and isotropic elastic waves) agree with
experimental measurements at low and at high shock intensities. The results are
presented and discussed, and future studies are outlined. Comment: 12 pages, 9 figure
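For orientation, the isotropic limit mentioned above can be sketched with the standard textbook relations: a linear shock velocity-particle velocity fit, the Rankine-Hugoniot momentum jump, and the Mie-Grüneisen form referenced to the Hugoniot. This is not the anisotropic EOS proposed in the paper. The symbols `c0`, `s` and `gamma`, the function names, and the numerical values are all assumptions chosen for illustration, not measured CFC properties, and the initial pressure and internal energy are taken as zero.

```python
def shock_velocity(u_p, c0, s):
    """Linear Hugoniot fit: U_s = c0 + s * u_p."""
    return c0 + s * u_p

def hugoniot_pressure(u_p, rho0, c0, s):
    """Rankine-Hugoniot momentum jump: P_H = rho0 * U_s * u_p."""
    return rho0 * shock_velocity(u_p, c0, s) * u_p

def hugoniot_pressure_from_compression(eta, rho0, c0, s):
    """Same Hugoniot pressure written in terms of eta = 1 - rho0/rho = u_p/U_s."""
    return rho0 * c0 ** 2 * eta / (1.0 - s * eta) ** 2

def mie_gruneisen_pressure(eta, e, rho0, c0, s, gamma):
    """Isotropic Mie-Grueneisen EOS: P = P_H + gamma * rho * (e - e_H).

    e is the specific internal energy and e_H = P_H * eta / (2 * rho0)
    is its value on the Hugoniot at the same compression.
    """
    p_h = hugoniot_pressure_from_compression(eta, rho0, c0, s)
    e_h = 0.5 * p_h * eta / rho0
    rho = rho0 / (1.0 - eta)
    return p_h + gamma * rho * (e - e_h)

# Illustrative values only (SI units): density, intercept, slope, Grueneisen gamma.
rho0, c0, s, gamma = 1500.0, 3000.0, 1.0, 0.8
u_p = 500.0
u_s = shock_velocity(u_p, c0, s)
eta = u_p / u_s
p_h = hugoniot_pressure(u_p, rho0, c0, s)
```

In this picture the intercept `c0` of the linear fit plays the role of a bulk sound speed, which is the quantity the abstract compares against the generalised anisotropic bulk speeds of sound.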
Spectral properties of zero temperature dynamics in a model of a compacting granular column
The compacting of a column of grains has been studied using a one-dimensional
Ising model with long range directed interactions in which down and up spins
represent orientations of the grain having or not having an associated void.
When the column is not shaken (zero 'temperature') the motion becomes highly
constrained and under most circumstances we find that the generator of the
stochastic dynamics assumes an unusual form: many eigenvalues become
degenerate, but the associated multi-dimensional invariant spaces have but a
single eigenvector. There is no spectral expansion and a Jordan form must be
used. Many properties of the dynamics are established here analytically; some
are not. General issues associated with the Jordan form are also taken up. Comment: 34 pages, 4 figures, 3 table
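The statement that a degenerate eigenvalue can have only a single eigenvector, so that a Jordan form replaces the spectral expansion, can be illustrated with the smallest possible case. The 2x2 block below is a generic textbook example, not the generator studied in the paper, and the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]

def mat_pow(a, n):
    """n-th power of a 2x2 matrix by repeated multiplication (n >= 0)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = mat_mul(result, a)
    return result

def jordan_block(lam):
    """2x2 Jordan block: eigenvalue lam twice, but only one eigenvector, (1, 0)."""
    return [[lam, 1.0], [0.0, lam]]

def jordan_block_power(lam, n):
    """Closed form of the n-th power (n >= 1): note the extra factor n off the diagonal."""
    return [[lam ** n, n * lam ** (n - 1)], [0.0, lam ** n]]

lam, n = 0.5, 6
numeric = mat_pow(jordan_block(lam), n)
closed_form = jordan_block_power(lam, n)
```

For a diagonalisable matrix every entry of the n-th power is a sum of pure powers of the eigenvalues; here the off-diagonal entry picks up the polynomial prefactor `n`, which is the signature of a Jordan block.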
A two-species model of a two-dimensional sandpile surface: a case of asymptotic roughening
We present and analyze a model of an evolving sandpile surface in (2 + 1)
dimensions where the dynamics of mobile grains (ρ(x, t)) and immobile
clusters (h(x, t)) are coupled. Our coupling models the situation where the
sandpile is flat on average, so that there is no bias due to gravity. We find
anomalous scaling: the expected logarithmic smoothing at short length and time
scales gives way to roughening in the asymptotic limit, where novel and
non-trivial exponents are found. Comment: 7 Pages, 6 Figures; Granular Matter, 2012 (Online
Archaeology and Desertification in the Wadi Faynan: the Fourth (1999) Season of the Wadi Faynan Landscape Survey
Reproduced with permission of the publisher. © 2000 Council for British Research in the Levant. Details of the publication are available at: http://www.cbrl.org.uk/Publications/publications_default.shtm
This report describes the fourth season of fieldwork by an interdisciplinary team of archaeologists and geographers working together to reconstruct the landscape history of the Wadi Faynan in southern Jordan. The particular focus of the project is the long-term history of inter-relationships between landscape and people, as a contribution to the study of processes of desertification and environmental degradation. The 1999 fieldwork contributed significantly towards the five objectives defined for the final two field seasons of the project in 1999 and 2000: to map the archaeology outside the ancient field systems flooring the wadi that have formed the principal focus of the archaeological survey in the previous seasons; to use ethnoarchaeological studies both to reconstruct modern and recent land use and also to yield archaeological signatures of land use to inform the analysis of the survey data; to complete the survey of ancient field systems and refine understanding of when and how they functioned; to complete the programme of geomorphological and palaeoecological fieldwork, and in particular to refine the chronology of climatic change and human impacts; and to complete the recording and classification of finds
Asymptotic normalization coefficients for 8B->7Be+p from a study of 8Li->7Li+n
Asymptotic normalization coefficients (ANCs) for 8Li->7Li+n have been
extracted from the neutron transfer reaction 13C(7Li,8Li)12C at 63 MeV. These
are related to the ANCs in 8B->7Be+p using charge symmetry. We extract ANCs for
8B that are in very good agreement with those inferred from proton transfer and
breakup experiments. We have also separated the contributions from the p_1/2
and p_3/2 components in the transfer. We find the astrophysical factor for the
7Be(p,gamma)8B reaction to be S_17(0) = 17.6 ± 1.7 eV b. This is the first time
that the rate of a direct capture reaction of astrophysical interest has been
determined through a measurement of the ANCs in the mirror system. Comment: 5 pages, 3 figures, 2 table